DETAILED ACTION 

Response to Amendment 

1. In response to the Office Action mailed August 6, 2008, Applicant submitted an 
amendment, filed on December 5, 2008, in which Applicant amended the claims and 
requested reconsideration. 

Response to Arguments 

2. Applicant has amended the claims to include limitations that require further 
consideration of the art. In particular, Applicant has amended the claims to include 
differentiating between the first speaker and the second speaker by associating speech 
received on the first channel with the first speaker and associating speech received on 
the second channel with the second speaker, and wherein the first 
speaker is employed as the reference speaker based on the quality of the first channel 
being higher than the quality of the second channel. 

Peterson discloses a call center wherein a full transcript of the agent and 
customer dialog is generated (column 18, line 61 - column 19, line 14). Peterson also teaches 
a topic detector and name-entity detector that are trained with sets of calls to better 
determine what was uttered by the speaker (column 20, line 57 - column 21, line 10), 
which implies that speech is recognized partially based on the recognition of the 
interaction. Furthermore, Peterson teaches that speakers are on different channels 
(column 13, lines 44-46 and column 11, lines 16-23), which is well known in the art of 
communication. However, Peterson does not specifically teach differentiating between the 
first speaker and the second speaker by associating speech received on the first channel 
with the first speaker and associating speech received on the second channel with the 
second speaker, and wherein the first speaker is employed as the 
reference speaker based on the quality of the first channel being higher than the quality 
of the second channel. 

Bartosik teaches a speech recognition device wherein the reference speakers 
use a microphone that causes the least distortion (paragraphs 0002, 0038, 0041), to 
improve the recognition. 

Popovici discloses language models using dialogue predictions. To interpret a 
new utterance in on-going interaction, the dialogue module takes into account the 
linguistic history and the active focus. The DM makes use of pragmatic 
expectations about what the user would probably say in the certain dialogue state 
(pages 815-816, section 2), to improve speech recognition and speech understanding. 
Therefore, Applicant's arguments are persuasive, but are moot for the reasons set forth 
above. 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1, 4, 6-21, 24 and 26-44 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Peterson et al. (USPN 6,922,466), hereinafter referenced as 
Peterson in view of Bartosik and in further view of Popovici et al. (Specialized language 
models using dialogue predictions), hereinafter referenced as Popovici. 

Regarding claims 1 and 21, Peterson discloses a speech data mining (mining; 
column 19, line 59 - column 20, line 21 and column 40, line 57 - column 41, line 4) 
system and method, hereinafter referenced as a system, for use in generating a rich 
transcription having utility in call center management, comprising: 

a speech differentiation module (speaker change detector; column 20, lines 3-57) 
adapted to receive speech input from the first speaker on a first channel (telephone 
lines), to receive speech input from the second speaker on a second channel, and to 
differentiate between the first speaker and the second speaker (telephone handset; 
column 11, lines 15-43) by identifying speech of the first speaker with speech received 
on the first channel, and identifying speech of the second speaker with speech received 
on the second channel (caller and live agent; column 22, lines 15-25 with column 20, 
lines 42-50); 

a speech recognition module (speech recognizer) improving automatic 
recognition of speech of a second speaker based on interaction of the second speaker 
with a first speaker preferentially employed as a reference speaker (column 20, lines 
3-57); and 

a transcript generation module (annotations) generating a rich transcript based at 
least in part on recognized speech of the second speaker (column 8, lines 1-6 and 
column 18, line 61 - column 19, line 14 with column 34, lines 46-59 and column 40, line 
57 - column 41, line 4), but does not specifically teach differentiating between the first 
speaker and the second speaker by associating speech received on the first channel 
with the first speaker and associating speech received on the second channel with the 
second speaker, and wherein the first speaker is employed as the 
reference speaker based on the quality of the first channel being higher than the quality 
of the second channel. 

Bartosik teaches a speech recognition device wherein the reference speakers 
use a microphone that causes the least distortion and associating speech received on 
the first channel with the first speaker and associating speech received on the second 
channel with the second speaker (paragraphs 0002, 0038, 0041), to improve the 
recognition. 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to modify Peterson's system as described above, to 
improve speech recognition and to obtain an accurate result (paragraph 0002), as 
taught by Bartosik. 

Peterson in view of Bartosik discloses a speech data mining system, but does 
not specifically teach a transcript generation module generating a rich transcript based 
at least in part on recognized speech of the second speaker recognized by the speech 
recognition module. 

Popovici discloses language models using dialogue predictions. To interpret a 
new utterance in on-going interaction, the dialogue module takes into account the 
linguistic history and the active focus. The DM makes use of pragmatic 
expectations about what the user would probably say in the certain dialogue state 
(pages 815-816, section 2), to improve speech recognition and speech understanding. 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to modify the method of Peterson in view of Bartosik as 
described above, to give better results and to improve speech recognition and speech 
understanding as taught by Popovici. 

Regarding claims 4 and 24, Peterson discloses a data mining system wherein 
said speech recognition module is adapted to employ the first speaker (first speaker) as 
the reference speaker based on availability of a speech model (speaker model) adapted 
to the first speaker (column 20, lines 42-50). 

Regarding claims 6 and 26, Peterson discloses a system wherein said speech 
recognition module is adapted to identify a topic with respect to which the speakers are 
interacting (topic detector), and to employ a focused language model (statistical model) 
to assist in speech recognition based on the topic (column 20, line 42 - column 21, line 
19). 

Regarding claims 7 and 27, Peterson discloses a system wherein said speech 
recognition module is adapted to receive an explicit topic selection from one of the 
speakers (topic; column 20, line 42 - column 21, line 19). 

Regarding claims 8 and 28, Peterson discloses a system wherein said speech 
recognition module is adapted to prompt a speaker corresponding to a call center 
customer to explicitly select one of a plurality of predetermined topics by pressing a 
corresponding button of a telephone keypad (touch-tone; column 14, lines 49-66). 

Regarding claims 9 and 29, Peterson discloses a system wherein said speech 
recognition module is adapted to identify a predetermined topic associated with an 
electronic form selected by call center personnel (call center's; column 14, lines 49-66). 

Regarding claims 10 and 30, Peterson discloses a system wherein said speech 
recognition module is adapted to extract at least one keyword from a speech recognition 
result of at least one of the interacting speakers, and to identify a predetermined topic 
based on the keyword (topic detector; column 20, line 42 - column 21, line 10). 

Regarding claims 11 and 31, Peterson discloses a system wherein said speech 
recognition module is adapted to extract context from a speech recognition result of the 
first speaker, and to employ the context extracted from the speech recognition result of 
the first speaker as context in a language model (statistical models) utilized to assist in 
recognizing speech of the second speaker (column 20, line 3 - column 21, line 19). 

Regarding claims 12 and 32, Peterson discloses a system wherein said speech 
recognition module is adapted to extract at least one keyword from a speech recognition 
result of the first speaker (word), and to supplement a constraint list (stock) used in 
recognizing speech of the second speaker based on the keyword extracted from the 
speech recognition result of the first speaker (column 20, line 58 - column 21, line 10). 



Application/Control Number: 10/616,006 Page 8 

Art Unit: 2626 

Regarding claims 13 and 33, Peterson discloses a system wherein said speech 
recognition module is adapted to extract at least one keyword from a speech recognition 
result of the first speaker (word), and to rescore recognition candidates generated 
during recognition of speech of the second speaker based on the keyword extracted 
from the speech recognition result of the first speaker (column 20, line 58 - column 21, 
line 10). 

Regarding claims 14 and 34, Peterson discloses a system wherein said speech 
recognition module is adapted to detect interruption of speech of one speaker by 
speech of another speaker (speaker change detector), and to employ the interruption as 
context in a language model (statistical model) utilized to assist in recognizing speech of 
the second speaker (column 20, line 2 - column 21, line 19). 

Regarding claims 15 and 35, Peterson discloses a system wherein said speech 
recognition module is adapted to detect an interruption of speech of one speaker by 
speech of another speaker (speaker change detector), and to record an instance of the 
interruption as mined speech data (mining; column 20, line 2 - column 21, line 19). 

Regarding claims 16 and 36, Peterson discloses a system wherein said speech 
recognition module is adapted to extract at least one keyword from a speech recognition 
result of at least one of the interacting speakers, to identify a frustration phrase 
associated with the keyword, and to record an instance of the frustration phrase as 
mined speech data (frustration; column 21, lines 10-19). 

Regarding claims 17 and 37, Peterson discloses a system wherein said speech 
recognition module is adapted to extract at least one keyword from a speech recognition 
result of at least one of the interacting speakers, to identify a polity expression 
associated with the keyword (polite), and to record an instance of the polity expression 
as mined speech data (column 21, lines 10-19). 

Regarding claims 18 and 38, Peterson discloses a system wherein said speech 
recognition module is adapted to extract at least one keyword from a speech recognition 
result of at least one of the interacting speakers, to identify a context corresponding to 
at least one of a topic (topic), complaint, solution, and resolution associated with the 
keyword, and to record an instance of the context as mined speech data (column 21, 
lines 10-19). 

Regarding claims 19 and 39, Peterson discloses a system wherein said speech 
recognition module is adapted to identify a number of interaction turns based on a shift 
in interaction from speaker to speaker (marks speakers turn), and to record the number 
of turns as mined speech data (column 20, lines 3-51). 

Regarding claims 20 and 40, Peterson discloses a system comprising a quality 
management subsystem employing mined speech data as feedback to at least one of a 
call center quality management process and a consumptible quality management 
process (quality; column 11, line 57 - column 12, line 42 and column 19, line 24 - 
column 20, line 21). 

Regarding claim 41, Peterson discloses a system wherein said speech 
recognition module is adapted to employ an interactive focused language model in 
which yes/no questions relate to context of at least one of preceding or subsequent 
speech of another interacting speaker (was the call resolved; column 19, lines 3-14). 

Regarding claim 42, Peterson discloses a system wherein said speech 
recognition module improves automatic recognition of the speech of the second speaker 
by employing previous and subsequent (tabulating) and recognized words of the 
speaker in addition to context of previous and subsequent interactions (interaction) with 
the referenced speaker (column 2, line 54 - column 3, line 4). 

Regarding claims 43 and 44, they are interpreted and rejected for reasons similar 
to those set forth for claims 1 and 21. In addition, Popovici discloses a system wherein 
said speech recognition module improves automatic recognition of the speech of the 
second speaker by determining a reliability of the recognized speech and, based on the 
reliability of the recognized speech, doing at least one of the following: confirming the 
recognized speech (confirm; page 816, section 2), highlighting the recognized speech in 
the transcript, attempting to recognize the speech again, or replacing the recognized 
speech based on another recognition attempt. 

5. Claims 5 and 25 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Peterson in view of Bartosik and Popovici and in further view of Liu et al. (PGPUB 
2004/0204939), hereinafter referenced as Liu. 

Regarding claims 5 and 25, Peterson in view of Bartosik and Popovici discloses a 
data mining system, but does not specifically teach a system wherein the speech 
differentiation module is adapted to use a speech biometric. 

Liu discloses a system and method for speaker change detection comprising 
using a speech biometric trained on speech of the first speaker to distinguish 
between speech of the first speaker and speech of another speaker (biometrics 
mechanism; paragraph 0041), to improve speaker change detection. 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time the invention was made to modify Peterson's system so that the speech 
differentiation module is adapted to use a speech biometric, as taught by Liu, to provide 
fast speaker boundary detection (column 1, paragraphs 0010-0011). 

Conclusion 

6. Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

7. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to JAKIEDA R. JACKSON whose telephone number is 
(571)272-7619. The examiner can normally be reached on Monday-Friday from 
5:30am-2:00pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth, can be reached on 571-272-7843. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/David R Hudspeth/ 

Supervisory Patent Examiner, Art Unit 2626 

/Jakieda R Jackson/ 
Examiner, Art Unit 2626 
January 8, 2008 



